Signature-level Modeling

I am working on two types of signature-level modeling. The first is subsampled models, in which we subsample to every \(100^{th}\) \(x\) location. The second is attempting to use an entire signature and model out the structure of the signature itself.

Subsampled models

I haven’t been able to come up with a clean, reproducible way to do these models yet. However, I have some examples from each barrel of how things vary across the 5-phased models.

Barrel-Land O-1 Results

Random effects for 5-phased subsample signature models for Barrel-Land O-1 (Hamby).
random_effect phase_1 phase_2 phase_3 phase_4 phase_5
operator 0.0425 0.0277 0.0009 0.0235 0.0000
bullet 0.0429 0.0975 0.0980 0.0917 0.0453
machine 0.0000 0.0017 0.0003 0.0072 0.0000
residual 0.7059 0.6837 0.6686 0.7245 0.7760

Barrel-Land P-1 Results

Random effects for 5-phased subsample signature models for Barrel-Land P-1 (Houston).
random_effect phase_1 phase_2 phase_3 phase_4 phase_5
operator 0.0000 0.0000 0.0000 0.000 0.000
bullet 0.0154 0.0931 0.0738 0.000 0.000
machine 0.0000 0.0291 0.0000 0.000 0.000
residual 1.1042 1.1494 1.1861 1.222 1.098

Barrel-Land G-1 Results

Random effects for 5-phased subsample signature models for Barrel-Land G-1 (CSAFE Persistence).
random_effect phase_1 phase_2 phase_3 phase_4 phase_5
operator 0.0000 0.0000 0.0000 0.0187 0.0019
bullet 0.0357 0.0296 0.0287 0.0199 0.0284
machine 0.0000 0.0000 0.0144 0.0169 0.0000
residual 0.5979 0.5716 0.5835 0.6046 0.6051

Full models (iterative structure removal)

One approach to modeling signatures is to iteratively remove elements of the signature which contribute to underlying structure, or signal. R&R studies aim to measure the variability - we can think of this as the noise - surrounding a particular structure. The structure, which we can think of as the signal in this case, is of less interest than the variability patterns surrounding it.

However, lots of elements can contribute to the structure. In our case, some of the study design elements contribute.

Pairwise Modeling

pairing_id generator

The problem:

Pairwise comparisons pipeline in bulletxtrctr uses expand.grid to create a grid of all pairwise comparisons for a set of LEA scans. This results in each pair of LEAs being compared twice. For example:

leas <- c("Land A", "Land B", "Land C")

expand.grid(land1 = leas, land2 = leas, stringsAsFactors = F)
##    land1  land2
## 1 Land A Land A
## 2 Land B Land A
## 3 Land C Land A
## 4 Land A Land B
## 5 Land B Land B
## 6 Land C Land B
## 7 Land A Land C
## 8 Land B Land C
## 9 Land C Land C

So we have “Land B - Land A” compared in row 2, and “Land A - Land B” compared in row 4. These are the same pairwise comparison.

In reality, we only need “Land A - Land B” to be compared ONCE.

For modeling the variability of pairwise comparison scores, allowing each pair to be compared twice could artifically reduce the variability present in scores by duplicating scores and increasing sample sizes.

The solution:

Function make_comparison_grid which takes in a list of LEA scan IDs and returns a data frame with columns land1, land2, and pairing_id. Instead of using expand.grid, it uses the combn function to find all combinations of size 2 within the vector of scan IDs.

make_comparison_grid <- function(scan_ids, self_comparisons = T){
  combos <- combn(scan_ids, 2) %>% t()
  combos_df <- data.frame(land1 = combos[,1], land2 = combos[,2]) 
  if(self_comparisons == T){
    self_combos <- data.frame(land1 = scan_ids, land2 = scan_ids)
    combos_df <- rbind(combos_df, self_combos) %>%
      mutate(pairing_id = paste0(land1, "_", land2))
  }
  else(combos_df <- combos_df %>% 
         mutate(pairing_id = paste0(land1, "_", land2)))
  return(combos_df)
}

make_comparison_grid(scan_ids = leas)
##    land1  land2    pairing_id
## 1 Land A Land B Land A_Land B
## 2 Land A Land C Land A_Land C
## 3 Land B Land C Land B_Land C
## 4 Land A Land A Land A_Land A
## 5 Land B Land B Land B_Land B
## 6 Land C Land C Land C_Land C

You can also choose whether to include self_comparisons, which would be comparing “Land A - Land A”, etc. This is the equivalent to including the diagonal in a grid of comparisons. The default is to include these comparisons.

make_comparison_grid(scan_ids = leas, self_comparisons = F)
##    land1  land2    pairing_id
## 1 Land A Land B Land A_Land B
## 2 Land A Land C Land A_Land C
## 3 Land B Land C Land B_Land C

This is a relatively simple solution, but will significantly reduce computation time when completing pairwise comparisons because it reduces the required number of comparisons from \(n^2\) to \(\frac{n(n+1)}{2}\) - nearly in half. An example of how this would update the comparison of two bullets is seen below:

You still get all the important comparisons, but you have about half the computing time required to actually get to all those RFscores!

Another thing to note is that even if you only calculate this reduced number of comparisons, you can still generate the visuals for the “full grid” by re-generating the alternative pairing ID’s. You can generate an alternate version, using NA values when it is a self-comparison (i.e., a scan to itself). Then once the alternate name is generated, you can gather the two names, which fills in the RFscore values for both versions of the name. Thus, you can plot the whole grid! The function is below.

expand_grid_visual <- function(pairing_id, rfscore){
  comp_grid <- data.frame(pairing_id = pairing_id, rfscore = rfscore)
  comp_grid <- comp_grid %>% mutate(pairing_id_alt = purrr::map_chr(as.character(pairing_id), .f = function(pairing_id){
    land1 = strsplit(pairing_id, "_")[[1]][1]
    land2 = strsplit(pairing_id, "_")[[1]][2]
    
    alt_id = ifelse(land1 == land2, NA, paste0(land2, "_", land1))
    return(alt_id)
  }))
  
  comp_grid_long <- 
    comp_grid %>% 
    mutate(pairing_id = as.character(pairing_id)) %>%
    gather(c(pairing_id, pairing_id_alt), 
           key = "id_method", value = "id_value") %>%
    mutate(id_dummy = id_value) %>%
    separate(col = id_dummy, into = c("land1", "land2"), sep = "_") %>%
    filter(!is.na(id_value))
  return(comp_grid_long)
}

A plot of the reduced and then back-expanded RFscores is below:

How does this change modeling??

## Linear mixed model fit by REML ['lmerMod']
## Formula: rfscore ~ unique_id_l1 + (1 | bullet_gt/unique_id_l1) + (1 |  
##     operator_gt) + (1 | machine_gt) + (1 | operator_gt:bullet_gt) +  
##     (1 | operator_gt:machine_gt) + (1 | bullet_gt:machine_gt)
##    Data: orange_pairs
## REML criterion at convergence: -38198.81
## Random effects:
##  Groups                 Name        Std.Dev. 
##  unique_id_l1:bullet_gt (Intercept) 0.1310007
##  bullet_gt:machine_gt   (Intercept) 0.0009005
##  operator_gt:machine_gt (Intercept) 0.0017356
##  operator_gt:bullet_gt  (Intercept) 0.0137432
##  machine_gt             (Intercept) 0.0018698
##  operator_gt            (Intercept) 0.0049594
##  bullet_gt              (Intercept) 0.0976507
##  Residual                           0.1631489
## Number of obs: 48606, groups:  
## unique_id_l1:bullet_gt, 12; bullet_gt:machine_gt, 4; operator_gt:machine_gt, 4; operator_gt:bullet_gt, 4; machine_gt, 2; operator_gt, 2; bullet_gt, 2
## Fixed Effects:
##        (Intercept)  unique_id_l1BL O-2  unique_id_l1BL O-3  
##            0.87994             0.11334             0.08562  
## unique_id_l1BL O-4  unique_id_l1BL O-5  unique_id_l1BL O-6  
##           -0.01999             0.10679            -0.39001  
## convergence code 0; 1 optimizer warnings; 0 lme4 warnings
## Linear mixed model fit by REML ['lmerMod']
## Formula: rfscore ~ unique_id_l1 + (1 | bullet_gt/unique_id_l1) + (1 |  
##     operator_gt) + (1 | machine_gt) + (1 | operator_gt:bullet_gt) +  
##     (1 | operator_gt:machine_gt) + (1 | bullet_gt:machine_gt)
##    Data: orange_pairs_reduced
## REML criterion at convergence: -19423.73
## Random effects:
##  Groups                 Name        Std.Dev.
##  unique_id_l1:bullet_gt (Intercept) 0.134045
##  bullet_gt:machine_gt   (Intercept) 0.001665
##  operator_gt:machine_gt (Intercept) 0.003021
##  operator_gt:bullet_gt  (Intercept) 0.016711
##  machine_gt             (Intercept) 0.002516
##  operator_gt            (Intercept) 0.001960
##  bullet_gt              (Intercept) 0.096396
##  Residual                           0.162611
## Number of obs: 24573, groups:  
## unique_id_l1:bullet_gt, 12; bullet_gt:machine_gt, 4; operator_gt:machine_gt, 4; operator_gt:bullet_gt, 4; machine_gt, 2; operator_gt, 2; bullet_gt, 2
## Fixed Effects:
##        (Intercept)  unique_id_l1BL O-2  unique_id_l1BL O-3  
##            0.88139             0.11221             0.08442  
## unique_id_l1BL O-4  unique_id_l1BL O-5  unique_id_l1BL O-6  
##           -0.02105             0.10573            -0.38681  
## convergence code 0; 1 optimizer warnings; 0 lme4 warnings